Semi-Automatic Identification of Bilingual Synonymous Technical Terms from Phrase Tables and Parallel Patent Sentences

نویسندگان

  • Bing Liang
  • Takehito Utsuro
  • Mikio Yamamoto
چکیده

In the research field of machine translation of patent documents, the issue of acquiring technical term translation equivalent pairs automatically from parallel patent documents is one of those most important. We take an approach of utilizing the phrase table of a state-of-the-art phrase-based statistical machine translation model. In this task, we consider situations where a technical term is observed in many parallel patent sentences and is translated into many translation equivalents. We apply SVM to the task of identifying synonymous translation equivalent pairs and achieve almost 98% precision and over 40% Fmeasure. Then, in order to improve recall, we introduce a semi-automatic framework, where we employ the strategy of selecting more than one seeds for each set of candidates bilingual synonymous term pairs. By manually judging whether each pair of two seeds is synonymous or not, we achieve over 95% precision and 50% recall.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Evaluating Features for Identifying Japanese-Chinese Bilingual Synonymous Technical Terms from Patent Families

In the process of translating patent documents, a bilingual lexicon of technical terms is inevitable knowledge source. It is important to develop techniques of acquiring technical term translation equivalent pairs automatically from parallel patent documents. We take an approach of utilizing the phrase table of a state-of-theart phrase-based statistical machine translation model. First, we coll...

متن کامل

Identifying Japanese-Chinese Bilingual Synonymous Technical Terms from Patent Families

In the task of acquiring Japanese-Chinese technical term translation equivalent pairs from parallel patent documents, this paper considers situations where a technical term is observed in many parallel patent sentences and is translated into many translation equivalents and studies the issue of identifying synonymous translation equivalent pairs. First, we collect candidates of synonymous trans...

متن کامل

Collecting Bilingual Technical Terms from Patent Families of Character-Segmented Chinese Sentences and Morpheme-Segmented Japanese Sentences

In manual translation of patent documents, a technical term bilingual lexicon is inevitable for a translator to efficiently translate patent documents. Dong et al. (2015) proposed a method of generating bilingual technical term lexicon from morpheme-segmented parallel patent sentences. The proposed method estimates Japanese-Chinese translation of technical terms using the phrase translation tab...

متن کامل

Compositional Translation of Technical Terms by Integrating Patent Families as a Parallel Corpus and a Comparable Corpus

In the previous methods of generating bilingual lexicon from parallel patent sentences extracted from patent families, the portion from which parallel patent sentences are extracted is about 30% out of the whole “Background” and “Embodiment” parts and about 70% are not used. Considering this situation, this paper proposes to generate bilingual lexicon for technical terms not only from the 30% b...

متن کامل

Integrating a Phrase-based SMT Model and a Bilingual Lexicon for Human in Semi-Automatic Acquisition of Technical Term Translation Lexicon

This paper presents an attempt at developing a technique of acquiring translation pairs of technical terms with sufficiently high precision from parallel patent documents. The approach taken in the proposed technique is based on integrating the phrase translation table of a state-of-the-art statistical phrasebased machine translation model, and compositional translation generation based on an e...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011